Best Inference Length AI Tools & Models - Premium Inference Length News

AI News

Mistral AI Launches Mistral 3 Series Open Source Models: 128K Context, Runs on a Single A100, Priced at Half of GPT-4o

Mistral AI has launched the Mistral 3 series of models, comprising 3B, 8B, and 14B dense models and the top-tier Mistral Large 3, covering edge to enterprise inference. The models are open-sourced under Apache 2.0 and free for commercial use, with a 128K context length and benchmark performance rivaling Llama 3.1...

11.9k 7 hours ago

SiliconFlow Updates DeepSeek-R1 and Other Inference Model APIs to Support a 128K Context Length

11k 2 days ago

SambaNova Launches Intelligent AI Chip SN40L, Capable of Running 5 Trillion Parameter Models

SambaNova has launched the SN40L, an intelligent AI chip that can run models with up to 5 trillion parameters while maintaining model accuracy. Manufactured by TSMC, the SN40L supports sequence lengths of more than 256K on a single system node, improving model quality and inference speed through integrated technology. The chip will power SambaNova's full-stack large language model (LLM) platform, which addresses the challenges enterprises face in deploying generative AI. The SN40L's distinguishing feature is its simultaneous processing capability.

9.8k 22 minutes ago

Models

(- = not listed)

| Model | Provider | Input ($/M tokens) | Output ($/M tokens) | Context length (K tokens) |
| --- | --- | --- | --- | --- |
| Grok 4 Fast | xAI | $1.4 | $3.5 | - |
| o3-mini | OpenAI | $7.7 | $30.8 | 200 |
| Claude Haiku 4.5 | Anthropic | - | $35 | 200 |
| Gemini 2.5 Flash | Google | $2.1 | $17.5 | - |
| Gemini 2.5 Flash-Lite | Google | $0.7 | $2.8 | - |
| Qianfan-Lightning | Baidu | - | - | 128 |
| Qwen3-Next-80B-A3B-Instruct | Alibaba | - | - | 256 |
| Doubao-1.5-pro-32k | ByteDance | $0.8 | - | 128 |
| Hunyuan-T1-20250822 | Tencent | - | - | - |
| Hunyuan-T1-latest | Tencent | - | - | - |
| DeepSeek-V3.1 | DeepSeek | - | $12 | 128 |
| Qwen3-1.7B | Alibaba | - | - | - |
| gpt-oss-20b | OpenAI | $0.4 | - | 128 |
| Qwen3-30B-A3B-Instruct-2507 | Alibaba | $0.75 | - | 256 |
| GPT-5 nano | OpenAI | $0.35 | $2.8 | 400 |
| Qwen3-235B-A22B-Instruct-2507 | Alibaba | - | - | - |
| Pangu-NLP-N2-128K-5.0.1.1 | Huawei | - | - | 128 |
| Pangu-NLP-N2-32K-5.0.1.1 | Huawei | - | - | - |
| GLM-4.5-Flash | ChatGLM | - | - | 128 |
| GLM-4.5-AirX | ChatGLM | - | - | 128 |
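
The per-million-token prices listed above can be turned into a rough per-request cost. The sketch below assumes that "tokens/M" means dollars per one million tokens; the function name `request_cost` and the token counts are hypothetical and chosen only for illustration, while the two prices are the GPT-5 nano figures from the listing.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Return the dollar cost of one request, given prices per million tokens."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000

# Hypothetical workload: a 100,000-token prompt with a 2,000-token reply,
# priced at the listed GPT-5 nano rates ($0.35 input, $2.8 output per million).
cost = request_cost(
    input_tokens=100_000,
    output_tokens=2_000,
    input_price_per_m=0.35,
    output_price_per_m=2.8,
)
print(f"${cost:.4f}")  # prints $0.0406
```

Because input and output are billed at different rates, a long prompt with a short reply can cost much less than the reverse, so both token counts matter when comparing models.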

Models

MiniMax M2 AWQ

NanoAgent 135M

Qwen3 Next 80B A3B Instruct AWQ 8bit

Megrez2 3x7B A3B GGUF

Qwen3 Next 80B A3B Instruct Bnb 4bit

Qwen3 Next 80B A3B Thinking AWQ 4bit

MobileLLM Pro

MiniCPM4.1 8B GGUF

Kimi K2 Instruct 0905 HQ4_K

NVIDIA Nemotron Nano 12B V2 GGUF

Schematron 8B

Cogito V2 Preview Deepseek 671B MoE FP8

Llama 3_3 Nemotron Super 49B V1_5 AWQ

Llama 3_3 Nemotron Super 49B V1_5 AWQ 4bit

Qwen3 235B A22B Thinking 2507 AWQ

Llama 3_3 Nemotron Super 49B V1_5 GGUF

Llama 3_3 Nemotron Super 49B V1_5

GTA1 72B

Qwen3 30B A1.5B High Speed GGUF

Phi 3.5 Mini Instruct